In this paper, we examined patent prior art search from a term
selection perspective.  While previous work has proposed various
solutions to improve retrieval effectiveness, we focused on a term
analysis of the patent query and its top-100 retrieved patents.  After
defining an oracular query based on relevance judgements, we
established that a standard LM retrieval model combined with query
reduction is sufficient to achieve state-of-the-art patent prior art
search performance.  Having found that automated query reduction
methods fail to offer significant performance
improvements,
%examined the most obvious features, such as document-frequent words, query-frequent words, IPC definition words, and pseudo relevance feedback, that might correlate with the RF score of terms in the top retrieved documents. We showed that these features help very little because they are a complicated mixture of useful terms and noisy words that cannot be separated easily.
we showed that MAP can be doubled with minimal user interaction by
approximating the oracular query through relevance feedback on a
single relevant document.  Given that such simple interactive query
reduction methods with a standard LM retrieval model outperform
highly engineered patent-specific search systems from CLEF-IP 2010,
we concluded that interactive methods offer a promising avenue for
simple but highly effective term selection in patent prior art
search.
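
For concreteness, the oracular query and its single-document
approximation can be sketched as follows.  This is an illustrative
formulation under stated assumptions rather than a restatement of our
exact definitions: we assume that $\mathcal{R}$ and $\mathcal{N}$ are
the relevant and irrelevant patents among the top-100 results for
query $Q$, that $\mathit{tf}(t,d)$ is the frequency of term $t$ in
patent $d$, that $\mathcal{V}(\cdot)$ is the set of terms occurring in
a set of patents, that $\tau$ is a hypothetical selection threshold,
and that $d^{+}$ is the single relevant patent identified by the user.
%
\begin{align*}
  \mathit{Rel}(t,Q) &= \frac{1}{|\mathcal{R}|}\sum_{d \in \mathcal{R}} \mathit{tf}(t,d),
  \qquad
  \mathit{Irr}(t,Q) = \frac{1}{|\mathcal{N}|}\sum_{d \in \mathcal{N}} \mathit{tf}(t,d),\\
  \mathit{RF}(t,Q) &= \mathit{Rel}(t,Q) - \mathit{Irr}(t,Q),\\
  Q_{\mathrm{orac}} &= \{\, t \in \mathcal{V}(\mathcal{R} \cup \mathcal{N}) : \mathit{RF}(t,Q) > \tau \,\},\\
  \hat{Q} &= \{\, t \in Q : \mathit{tf}(t,d^{+}) > 0 \,\}.
\end{align*}
%
Under this sketch, $Q_{\mathrm{orac}}$ requires relevance judgements
for all top-100 patents, whereas the reduced query $\hat{Q}$ needs
only one judged patent, which is what keeps the user interaction
minimal.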

% Future work not needed and in fact discouraged for a poster paper.  Finish on your
% key take-home point.  -Scott

%For future work, we plan to analyse more features that are independent of relevance feedback but correlate with the RF score. Inspired by prior work proposing query reduction and term selection techniques for long non-patent queries~\cite{maxwell2013compact,kumaran2009reducing}, we also plan to apply these techniques to patent retrieval.


%Opposite our initial assumption, features such as document frequent words, frequent words in the query, IPC code definition words, and pseudo relevance feedback could not help to refine the best query because they were the combination of useful words and noisy words and our system is too sensitive to the existence of the noisy words as well removing . 
